Finding Sequence Features in Tissue-specific Sequences

Authors

  • Arvind Rao
  • Alfred O. Hero
  • David J. States
  • James Douglas Engel
Abstract

The discovery of motifs underlying gene expression is a challenging problem. Some of these motifs correspond to known transcription factor binding sites, but sequence inspection often provides valuable clues and can even lead to the discovery of novel motifs with uncharacterized function in gene expression. Given the complexity underlying tissue-specific gene expression, several motifs may be putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the principled selection of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research. This work makes two main contributions: first, we introduce a new metric for variable selection during classification; second, we investigate the problem of finding specific sequence motifs that underlie tissue-specific gene expression. Using this approach in conjunction with an SVM classifier, we find these motifs and discover several novel motifs that have not yet been attributed any particular functional role (e.g., as transcription factor binding sites). We hypothesize that the discovery of these motifs would enable large-scale investigation of the tissue-specific regulatory potential of any conserved sequence element identified from genome-wide studies. Finally, we propose that the framework can be used not only to aid the discovery of discriminatory motifs, but also to examine the role of any motif of choice in the co-regulation or co-expression of gene groups.

Publication date: 2007